16 research outputs found
A Practical Approach to Protect IoT Devices against Attacks and Compile Security Incident Datasets
The Internet of Things (IoT) introduced the opportunity of remotely manipulating home appliances (such as heating systems, ovens, and blinds) using computers and mobile devices. This idea fascinated people and originated a boom of IoT devices, together with an increasing demand that was difficult to support. Many manufacturers quickly created hundreds of devices implementing functionalities but neglected critical issues pertaining to device security. This oversight gave rise to the current situation, where thousands of devices remain unpatched and suffer from security issues that manufacturers cannot address after the devices have been produced and deployed. This article presents our novel research on protecting IoT devices using Berkeley Packet Filters (BPFs) and evaluates our findings with the aid of our Filter.tlk tool, which facilitates the development of BPF expressions that can be executed by GNU/Linux systems with a low impact on network packet throughput.
Funding: Xunta de Galicia | Ref. ED481B 2017/018; Ministerio de Economía y Competitividad | Ref. TIN2017-84658-C2-1-R; Xunta de Galicia | Ref. ED431C2018/55-GR
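The abstract above mentions BPF expressions executed by GNU/Linux systems. As a rough illustration only (not taken from Filter.tlk, whose output is not shown here), the following sketch hand-assembles a classic BPF program in Python that accepts IPv4/TCP Ethernet frames and drops everything else; the opcode values come from `<linux/filter.h>`, while the two-check filter itself is a toy example of our own:

```python
import struct

# Classic BPF opcode constants (values from <linux/filter.h>).
BPF_LD, BPF_H, BPF_B, BPF_ABS = 0x00, 0x08, 0x10, 0x20
BPF_JMP, BPF_JEQ, BPF_K = 0x05, 0x10, 0x00
BPF_RET = 0x06

def ins(code, jt, jf, k):
    """Pack one sock_filter struct: u16 code, u8 jt, u8 jf, u32 k."""
    return struct.pack("HBBI", code, jt, jf, k)

# Accept only IPv4/TCP frames (assumes Ethernet framing; jump offsets
# are relative to the *next* instruction).
program = b"".join([
    ins(BPF_LD | BPF_H | BPF_ABS, 0, 0, 12),       # A <- EtherType
    ins(BPF_JMP | BPF_JEQ | BPF_K, 0, 3, 0x0800),  # IPv4? else jump to drop
    ins(BPF_LD | BPF_B | BPF_ABS, 0, 0, 23),       # A <- IP protocol byte
    ins(BPF_JMP | BPF_JEQ | BPF_K, 0, 1, 6),       # TCP? else jump to drop
    ins(BPF_RET | BPF_K, 0, 0, 0xFFFF),            # accept up to 65535 bytes
    ins(BPF_RET | BPF_K, 0, 0, 0),                 # drop the frame
])

print(len(program))  # 6 instructions x 8 bytes = 48
```

Attaching the packed program to a raw socket via `setsockopt(SOL_SOCKET, SO_ATTACH_FILTER, ...)` requires root privileges on Linux, so it is omitted here; in practice such programs are usually generated from a high-level expression (e.g. `tcpdump -dd 'ip and tcp'`).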
Improving Pipelining Tools for Pre-processing Data
The last several years have seen the emergence of data mining and its transformation into a powerful tool that adds value to business and research. Data mining makes it possible to explore and find unseen connections between variables and facts observed in different domains, helping us to better understand reality. The programming methods and frameworks used to analyse data have evolved over time. Currently, pipelining schemes are the most reliable way of analysing data and, for this reason, several important companies offer services of this kind. Moreover, several frameworks compatible with different programming languages are available for the development of computational pipelines, and many research studies have addressed the optimization of data processing speed. However, as this study shows, early error detection techniques and developer support mechanisms are very limited in these frameworks. In this context, this study introduces several improvements: the design of different types of constraints for the early detection of errors, functions to facilitate the debugging of concrete tasks included in a pipeline, the invalidation of erroneous instances, and the introduction of a burst-processing scheme. By adding these functionalities, we developed Big Data Pipelining for Java (BDP4J, https://github.com/sing-group/bdp4j), a fully functional new pipelining framework that shows the potential of these features.
Funding: Agencia Estatal de Investigación | Ref. TIN2017-84658-C2-1-R; Xunta de Galicia | Ref. ED481D-2021/024; Xunta de Galicia | Ref. ED431C2018/55-GR
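BDP4J's real API is documented in its repository; purely to illustrate the kind of early error detection and instance invalidation described above, here is a minimal Python sketch (all class and method names are ours, not BDP4J's):

```python
class Task:
    """A pipeline step declaring its input/output data types up front."""
    def __init__(self, name, fn, in_type, out_type):
        self.name, self.fn = name, fn
        self.in_type, self.out_type = in_type, out_type

class Pipeline:
    def __init__(self, tasks):
        # Early error detection: verify that consecutive tasks are
        # type-compatible *before* any instance is processed.
        for a, b in zip(tasks, tasks[1:]):
            if a.out_type is not b.in_type:
                raise TypeError(f"{a.name} outputs {a.out_type.__name__}, "
                                f"but {b.name} expects {b.in_type.__name__}")
        self.tasks = tasks

    def run(self, instances):
        results = []
        for inst in instances:
            try:
                for t in self.tasks:
                    inst = t.fn(inst)
                results.append(inst)
            except Exception:
                # Invalidate the erroneous instance; keep processing the rest.
                pass
        return results

pipe = Pipeline([
    Task("tokenize", lambda s: s.split(), str, list),
    Task("count", lambda toks: len(toks), list, int),
])
print(pipe.run(["spam ham eggs", None]))  # [3] -- the None instance is invalidated
```

Checking task compatibility at pipeline-construction time (rather than failing mid-run on real data) is the essence of the constraint-based early error detection the abstract describes.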
A Multiple Classifier System Identifies Novel Cannabinoid CB2 Receptor Ligands
Drugs have become an essential part of our lives due to their ability to improve people's health and quality of life. However, for many diseases, approved drugs are not yet available, or existing drugs have undesirable side effects, making the pharmaceutical industry strive to discover new drugs and active compounds. The development of drugs is an expensive process, which typically starts with the detection of candidate molecules (screening) for an identified protein target. To this end, the use of high-performance screening techniques has become a critical issue in order to palliate the high costs. Therefore, the popularity of computer-based screening (often called virtual screening or in-silico screening) has rapidly increased during the last decade. A wide variety of Machine Learning (ML) techniques has been used in conjunction with chemical structure and physicochemical properties for screening purposes, including (i) simple classifiers, (ii) ensemble methods, and, more recently, (iii) Multiple Classifier Systems (MCS). In this work, we apply an MCS for virtual screening (D2-MCS) using circular fingerprints. We applied our technique to a dataset of cannabinoid CB2 ligands obtained from the ChEMBL database. The HTS collection of Enamine (1,834,362 compounds) was virtually screened with D2-MCS to identify 48,432 potential active molecules. This list was subsequently clustered based on circular fingerprints and, from each cluster, the most active compound was kept. From these, the top 60 were retained and 21 novel compounds were purchased. Experimental validation confirmed six highly active hits (>50% displacement at 10 μM and subsequent Ki determination) and an additional five medium active hits (>25% displacement at 10 μM). D2-MCS hence provided a hit rate of 29% for highly active compounds and an overall hit rate of 52%.
Funding: Dutch Scientific Council | Ref. VENI 14410; Xunta de Galicia | Ref. ED431C2018/55-GR
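The grouping strategy that defines D2-MCS is described in the paper itself; the snippet below only illustrates the generic multiple-classifier-system idea of combining several weak decisions over a fingerprint bit-vector by majority vote (the base classifiers and bit meanings are invented for the example):

```python
def majority_vote(classifiers, fingerprint):
    """Combine base classifiers' 0/1 decisions by unweighted majority vote."""
    votes = sum(clf(fingerprint) for clf in classifiers)
    return 1 if votes > len(classifiers) / 2 else 0

# Toy base classifiers over a binary fingerprint (hypothetical bit meanings):
base = [
    lambda fp: fp[0],                     # presence of one substructure bit
    lambda fp: 1 if sum(fp) >= 3 else 0,  # overall bit density
    lambda fp: fp[1] & fp[2],             # two co-occurring substructures
]

# Keep only the fingerprints the combined system predicts as active.
actives = [fp for fp in [(1, 1, 1, 0), (0, 1, 0, 0), (1, 0, 1, 1)]
           if majority_vote(base, fp)]
print(actives)  # [(1, 1, 1, 0), (1, 0, 1, 1)]
```

In a real virtual-screening setting the base classifiers would be trained models and the bit-vectors would be circular (e.g. Morgan/ECFP-style) fingerprints of candidate compounds; the combination step, however, has exactly this shape.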
Model for optimising the execution of anti-spam filters
The establishment of the first interconnection between two remote hosts in 1969 marked the beginning of one of the most important technological phenomena of humanity: the Internet. In fact, the Internet has become an essential part of life for many people in the most industrialized nations, reaching a penetration of 40% of the world population in 2014.
One of the reasons behind the massive proliferation of the Internet is e-mail. This service allows easy and almost instantaneous communication between users by sending messages, which has earned it surprising popularity. However, the uncontrolled nature of the Internet has turned e-mail communications into the best framework for the promotion of illegal advertisements (such as those about drug selling), the delivery of phishing e-mails, virus propagation, and other forms of electronic scam (also called spam).
Although the amount of spam e-mail delivered undergoes continuous fluctuations, current statistics show that more than 60% of the e-mail transferred through the Internet is spam. This spam ratio is sustained by the newest communication advances, such as 4G networks, which ensure quick and easy Internet connections almost everywhere.
Under these circumstances, the use of spam-filtering services and products is the most effective mechanism to fight spam. However, the massive number of e-mail deliveries per day (an average of 125 billion in 2015) has created the need to improve spam-filtering services in order to adapt them to current needs.
This research work introduces a new filtering model able to enhance speed and accuracy while maintaining the same philosophy and anti-spam techniques used in the most popular anti-spam filtering systems. This goal has been achieved by improving several aspects, including: (i) the design and development of small technical improvements to enhance overall filter throughput, (ii) the application of genetic algorithms to enhance filter accuracy, and (iii) the use of scheduling algorithms to increase filtering speed.

During the last decade, the Internet became an essential tool for communication between people. The advantages it introduced were quickly exploited by millions of users to make services such as e-commerce, online banking, and social networks a reality. This environment was also targeted by those wishing to use the new technologies to market illegal or disreputable products, or to publish and send content that annoys users. Thus appeared spammers and spam content, which now spread across social networks, e-mail, forums, blogs, etc.

Filtering and removing spam content requires software or services able to detect it. Nowadays, anti-spam filtering is distributed as a service: it is common and effective to contract anti-spam filtering services consisting of specific filtering software or hardware plus an update service that adapts the filter's behaviour to the variations that occur in the e-mails being distributed.

Currently, these filtering services are based on SpamAssassin, software whose characteristics allow the filter's behaviour to be modelled dynamically and the resulting filters to be distributed to the filtering software installed on clients. The possibility of modelling content filters was undoubtedly the most valued characteristic of SpamAssassin, and it led this solution to be adopted even by large companies such as Symantec (Symantec Brightmail) and McAfee (McAfee SpamKiller).
Funding: Xunta de Galicia | Ref. 08TIC041E; Xunta de Galicia | Ref. 09TIC028
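The concrete scheduling algorithms are developed in the thesis itself; as a purely hypothetical sketch of the general idea, the following code evaluates SpamAssassin-style scored rules cheapest-first and stops as soon as the verdict can no longer change:

```python
def classify(message, rules, threshold=5.0):
    """Short-circuit evaluation of scored anti-spam rules.

    Each rule is (cost, score, predicate). Rules are scheduled cheapest
    first; evaluation stops once the accumulated score reaches the spam
    threshold, or once the remaining rules cannot possibly reach it.
    """
    schedule = sorted(rules, key=lambda r: r[0])        # cheapest first
    total, remaining = 0.0, sum(r[1] for r in schedule)
    for cost, score, predicate in schedule:
        remaining -= score
        if predicate(message):
            total += score
        if total >= threshold:
            return "spam"               # verdict already decided
        if total + remaining < threshold:
            return "ham"                # threshold is now unreachable
    return "ham"

rules = [
    (1, 3.0, lambda m: "viagra" in m.lower()),  # cheap keyword check
    (2, 2.5, lambda m: m.isupper()),            # cheap shouting heuristic
    (50, 4.0, lambda m: False),                 # expensive check (e.g. a network lookup)
]
print(classify("BUY VIAGRA NOW", rules))  # "spam", expensive rule never runs
```

In both outcomes of this example the expensive third rule is skipped: for an obvious spam the threshold is reached after two cheap rules, and for an obvious ham the remaining score can no longer reach the threshold. That early-exit property is what makes rule scheduling a throughput optimisation.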
An ontology knowledge inspection methodology for quality assessment and continuous improvement
Ontology-learning methods were introduced in the knowledge engineering area to automatically build ontologies from natural language texts related to a domain. Despite the initial appeal of these methods, automatically generated ontologies may have errors, inconsistencies, and poor design quality, all of which must be fixed manually in order to maintain the validity and usefulness of the automated output. In this work, we propose a methodology to assess ontology quality (quantitatively and graphically) and to fix ontology inconsistencies while minimizing design defects. The proposed methodology is based on the Deming cycle and is grounded in quality standards that proved effective in the software engineering domain and show high potential to be extended to knowledge engineering quality management. This paper demonstrates that software engineering quality assessment approaches and techniques can be successfully extended and applied to the ontology-fixing and quality improvement problem. The proposed methodology was validated on a testing ontology by comparing the design quality of a manually created and an automatically generated ontology.
Funding: Funded for open access publication: Universidade de Vigo/CISUG; Xunta de Galicia | Ref. ED481B 2017/018; Xunta de Galicia | Ref. ED431C2018/55-GRC; Ministerio de Economía, Industria y Competitividad | Ref. TIN2017-84658-C2-1-
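The standards and metrics the methodology builds on are detailed in the paper; the following toy sketch merely illustrates what the quantitative "check" step of a Deming-style cycle could look like on a small class hierarchy (the metric definitions and thresholds are invented for the example):

```python
# Toy ontology as child -> parents edges (hypothetical domain).
hierarchy = {
    "Dog": ["Mammal"], "Cat": ["Mammal"],
    "Mammal": ["Animal"], "Bird": ["Animal"],
    "Platypus": ["Mammal", "Bird"],  # multiple inheritance: a design smell here
    "Animal": [],
}

def tangledness(h):
    """Fraction of classes with more than one parent (lower is better)."""
    multi = sum(1 for parents in h.values() if len(parents) > 1)
    return multi / len(h)

def depth(h, cls):
    """Length of the longest path from cls up to a root class."""
    parents = h[cls]
    return 0 if not parents else 1 + max(depth(h, p) for p in parents)

# "Check" step of a Deming-style cycle: flag metrics beyond a threshold.
report = {"tangledness": tangledness(hierarchy),
          "max_depth": max(depth(hierarchy, c) for c in hierarchy)}
thresholds = {"tangledness": 0.1, "max_depth": 6}
flags = [m for m, v in report.items() if v > thresholds[m]]
print(report, flags)  # Platypus's two parents push tangledness over the limit
```

Flagged metrics would feed the "act" step of the cycle, pointing the engineer at the specific defects (here, the multi-parent class) to fix before re-measuring.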